Skip to content

issues/178: fix fallback#179

Merged
paddybyers merged 7 commits intomasterfrom
tpr/fixfallback
Oct 13, 2016
Merged

issues/178: fix fallback#179
paddybyers merged 7 commits intomasterfrom
tpr/fixfallback

Conversation

@trenouf
Copy link
Copy Markdown
Contributor

@trenouf trenouf commented Oct 11, 2016

Fixes
#178

Tim Renouf added 4 commits October 10, 2016 10:30
Also requires changes to Http.java itself to allow its tests to be in a
different package.
Also requires a change to Defaults.java to allow its tests to be in a
different package.
Set the http host to try and ensure that realtime and rest use the same
region:
 - if we're on the default realtime host, set http to the default rest
   host
 - otherwise (the realtime host has been overridden or has fallen back),
   set http to the same as realtime.

Previously this was done on connection initiation. This commit changes
it to happen on successful connect.
Fallback was not working after a ConnectionManager timeout. Fixed.

This commit also ensures that the ConnectionManager default errors have
http status codes, as required for the fallback fix.
@trenouf
Copy link
Copy Markdown
Contributor Author

trenouf commented Oct 11, 2016

I've got an intermittent failure in RealtimeConnectFailTest.connect_fail_suspended. Not sure if it was happening before my changes.

@trenouf
Copy link
Copy Markdown
Contributor Author

trenouf commented Oct 11, 2016

I think the failure is new.

@trenouf
Copy link
Copy Markdown
Contributor Author

trenouf commented Oct 11, 2016

The travis builds said:
io.ably.lib.test.realtime.RealtimeSuite > io.ably.lib.test.realtime.ConnectionManagerTest.connectionmanager_reconnect_default_endpoint FAILED
java.lang.AssertionError at ConnectionManagerTest.java:209

@paddybyers
Copy link
Copy Markdown
Member

Reviewing now, but if the failure is only on Travis we will merge this anyway pending further investigation. Travis seems to have trouble with tests where there is a dependency on things happening at specific times.

@trenouf
Copy link
Copy Markdown
Contributor Author

trenouf commented Oct 11, 2016

I still need to track down the first test failure I mentioned: connect_fail_suspended.

Copy link
Copy Markdown
Contributor Author

@trenouf trenouf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connect_fail_suspended failure:
Setting ConnectManager.transport earlier like this has the following effect: when my connect attempt fails with a host not found exception in the transport, I get a notifyState(disconnected) from WsClient. Then, right behind it, WebSocketTransport.onClose() calls notifyState(transport, disconnected). Before my change, the second one gets ignored because ConnectionManager.transport has not been set yet. Now, it does not get ignored, and it causes the ConnectionManager state to go back to disconnected after it has just changed to suspended.

So I need to figure out how to stop that happening whilst getting all the synchronisation right.

@paddybyers
Copy link
Copy Markdown
Member

But any transition is first vetted by checkSuspend(), which decides, based on how long since the last connection existed, whether to go into disconnected or suspended. Doesn't that still apply?

@trenouf
Copy link
Copy Markdown
Contributor Author

trenouf commented Oct 12, 2016

Good point, I need to work out why that isn't happening on the second one. So I guess you don't think there is any need to suppress the extra notification from the transport.

@trenouf
Copy link
Copy Markdown
Contributor Author

trenouf commented Oct 12, 2016

It only calls checkSuspend if it was in connecting state. But, on my second event, it's on suspended. I'll add code in handleStateChange to stop it transitioning from suspended to disconnected.

Tim Renouf added 3 commits October 12, 2016 08:41
I was getting problems where a transport that ConnectionManager had
abandoned due to timing out was, a bit later, generating an
onTransportUnavailable that was not ignored.

Fixed by setting ConnectionManager.transport in connectImpl so we can
tell whether events are from the right transport.

This fix leads to double disconnected notifications from the transport.
So I also needed to fix handleStateChange to stop a transition from
suspended to disconnected.

Also added a 1s delay in RealtimeConnectFailTest.connect_fail_suspended
to ensure that it detects the double disconnected bug.
Previously, it was determining whether to try and fall back, and thus
stay in "connecting" state, in checkSuspend, but not determining the
fallback host until connectImpl. The problem with that is that failure
to get a fallback host (the list has been exhausted, or the original
host is non-default so no fallback is done) resulted in "failure" state.

Now it determines the fallback host in checkSuspend, so it is in a
position to change to "disconnected" state if that fails.

This fix also ensures that lack of internet connectivity leads to
"disconnected" rather than "failed".
* Connection failure due to incorrect hostname or incorrect port should
  lead to "disconnected" state, not "failed" state, even after
  fallbacks.

* Lack of internet connectivity (which suppresses fallbacks) should lead
  to "disconnected" state, not "failed" state.

* Using a sandbox application key on production realtime gives "failed".

* The http host name gets updated to match the realtime one only when
  realtime connection succeeds, which means that none of the tests here
  for updating the http host can work.
@paddybyers
Copy link
Copy Markdown
Member

I'll add code in handleStateChange to stop it transitioning from suspended to disconnected.

Instead of having explicit code to handle different cases, is it not possible to have it handled uniformly? If handling the disconnection (or failure to connect) of a transport, then check whether to go into disconnected or suspended?

@trenouf
Copy link
Copy Markdown
Contributor Author

trenouf commented Oct 12, 2016

You mean call checkSuspend even when already in disconnected? I fixed the fallback problems by working out what to fall back to in there, so I would need to separate that out again if checkSuspend can be called multiple times for a disconnection or failure to connect.

@trenouf
Copy link
Copy Markdown
Contributor Author

trenouf commented Oct 12, 2016

Anyway, I've done another push.

@trenouf trenouf mentioned this pull request Oct 12, 2016
@paddybyers paddybyers merged commit 3bfedfc into master Oct 13, 2016
@paddybyers paddybyers deleted the tpr/fixfallback branch October 13, 2016 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants